feat(ssh): wire per-host sudo-mode learning into discovery/intelligence/liveness #576
Merged
…ce/liveness

Follow-up to PR #575 (auth-method learning). The same three paths that talk to a managed host still probed sudo mode from scratch every cycle, running a doomed `sudo -n` on a password-sudo host before retrying `sudo -S`. Extend the connprofile memo to the SUDO dimension so each path leads with the host's recorded mode and records the mode confirmed to work.

Division of labour:
- The liveness privilege probe is the AUTHORITATIVE sudo-mode learner: it runs an innocuous `true` sentinel every ~5 min, so it reliably confirms the mode regardless of any real command's exit code.
- OS discovery (firewall probe) and OS intelligence (collector) learn OPPORTUNISTICALLY from their existing real sudo commands, with no extra round-trip. To avoid misrecording, a mode is recorded ONLY on a confirmed exit-0 of a given sudo form, never inferred from a command that failed for its own reasons.

Mechanics:
- ssh.RunSudo (the collector's shared primitive) gains prefer (in) and observed (out) parameters: when prefer=SudoPassword and the password gate is satisfied, it leads with `sudo -S`, skipping the doomed `sudo -n`; observed reports the confirmed form. Plain-string tokens (SudoNopasswd/SudoPassword) keep the ssh package decoupled from connprofile, matching PreferKey/PreferPassword.
- discovery.runSudoWithFallback + probeFirewall thread prefer and return the learned mode; the discovery Service reads it (cfg.prefer) and records it.
- sshprivilege.Probe extracts probeSudo: it leads with `sudo -S` on a known password host and records the confirmed mode (preserving the AC-18/19/21 error shapes).
- collector threads the recorded mode across the cycle's sudo commands and records once at the end.
- cmd/openwatch wires the shared connprofile store into the collector and discovery services (the probe already had it from #575).

The sudo password GATE (kill-switch + auth-method) is unchanged: leading with `sudo -S` is allowed only when a password may already be fed. Learning stays best-effort: a store miss/error escalates in the default order and never fails the connection; a stale mode self-heals (a `sudo -S` miss falls back to `sudo -n`).

Spec system-connection-profile -> v1.2.0: C-07, AC-10 (RunSudo primitive), AC-11 (discovery firewall probe), AC-12 (liveness probe).
remyluslosius added a commit that referenced this pull request Jun 16, 2026
Closes the project's biggest test blind spot: the dial, auth-ordering, and sudo -n/-S paths were only unit-tested at the command-construction level (stubbed transport), never against a real box. A wired-up host could regress while every test stayed green.

internal/ssh/livehost_test.go drives the REAL ssh.Dial + ssh.RunSudo, the primitives shared by every host-talking path (scan, discovery, collector, liveness), against an operator-supplied inventory:

  OPENWATCH_LIVE_HOSTS=/path/to/test_hosts.csv   (hostname,ip,username,credential)
  OPENWATCH_LIVE_KEY=/path/to/id_rsa

With either unset the test calls t.Skip(), so it never gates normal CI; the inventory and key stay on the operator's workstation, never in the repo.

The fleet is heterogeneous, so the test DISCOVERS each host's capabilities rather than demanding every method everywhere. Per host it asserts the machinery for whatever the host supports:
- key auth dials -> ObservedAuth == "key" (the value the memo records)
- password auth dials -> ObservedAuth == "password"
- sudo mode confirmed via the `true` sentinel (nopasswd | password)
- the real `sudo -S -k -p '' true` password-on-stdin path executes

A server-side auth rejection (key not authorized, or PasswordAuthentication off) is a tolerated host-config fact; an unreachable host is skipped; only an unexpected protocol-level error or a wrong ObservedAuth/sudo result fails the test. A host with no usable auth is skipped.

Validated against the dev fleet: 5 key+NOPASSWD hosts pass (real key dial, sudo -n, and sudo -S all exercised); key-rejecting and unreachable hosts skip. The password-AUTH assertion is live-unverified only because the dev fleet runs PasswordAuthentication=no everywhere (noted in BACKLOG); it runs as soon as one password-enabled host is in the inventory.

Also drops the completed "wire SSH auth/sudo learning" backlog entry (shipped in #575 + #576).
What
Follow-up to #575 (which wired the SSH auth-method dimension). This wires the sudo-mode dimension of the `connprofile` memo into the same three paths:
- liveness privilege probe (`internal/sshprivilege`)
- OS discovery (`internal/intelligence/discovery`)
- OS intelligence collector (`internal/intelligence/collector`)

Each now leads sudo with the host's recorded mode (skipping the doomed `sudo -n` on a password-sudo host) and records the mode confirmed to work.

Division of labour
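- The liveness privilege probe is the authoritative sudo-mode learner: it runs an innocuous `true` sentinel every ~5 min, so it reliably confirms the mode regardless of any real command's exit code.
- OS discovery (firewall probe) and the OS intelligence collector learn opportunistically from their existing real sudo commands, with no extra round-trip. A mode is recorded only on a confirmed exit 0 of a given sudo form, never inferred from a command that failed for its own reasons. (The `true`-sentinel rule, C-04, binds only the scan; these paths defer to the liveness probe for authoritative confirmation.)

Mechanics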
- `ssh.RunSudo` (the collector's shared primitive) gains `prefer` (in) + `observed` (out). With `prefer=SudoPassword` and the password gate satisfied, it leads with `sudo -S`, issuing zero `sudo -n` calls; `observed` reports the confirmed form. Plain-string tokens (`SudoNopasswd`/`SudoPassword`) keep the `ssh` package decoupled from `connprofile`, mirroring `PreferKey`/`PreferPassword`.
- `discovery.runSudoWithFallback` + `probeFirewall` thread `prefer` and return the learned mode; the discovery `Service` reads it (`cfg.prefer`) and records it.
- `sshprivilege.Probe` extracts `probeSudo`: it leads with `sudo -S` on a known password host, records the confirmed mode, and preserves the exact AC-18/19/21 error shapes.
- `collector` threads the recorded mode across the cycle's sudo commands and records once at the end.
- `cmd/openwatch` wires the shared `connprofile` store into the collector + discovery services (the probe already had it from #575).

Safety
The sudo password gate (kill-switch + auth-method ∈ {password, both}) is unchanged: leading with `sudo -S` is allowed only when a password may already be fed. Learning is best-effort: a store miss/error escalates in the default order and never fails the connection; a stale mode self-heals (a `sudo -S` miss falls back to `sudo -n`).

Spec / tests
- `system-connection-profile` → v1.2.0: adds C-07 and AC-10 (RunSudo primitive), AC-11 (discovery firewall probe), AC-12 (liveness probe).
- `TestRunSudo_SudoModeLearning` (lead-with, observe nopasswd/password, no-observation-when-ambiguous, stale-hint self-heal), `TestProbeFirewall_SudoModeLearning`, `TestPrivilegeProbe_SudoModeLearning`. Existing sudo/firewall tests updated for the new signatures + learning assertions.
- `gofmt` / `go vet` / `go build ./...` clean; touched-package suites green against the isolated test DB; `specter check` 0 errors; `system-connection-profile` 12/12 ACs have results, PASSES tier 2.

With this, the auth-method and sudo-mode learning the compliance scan already had (#566) now covers all four host-talking paths.